This RMarkdown document is part of the Generic Skills Component (GSK) of the Foundation Studies Programme at Srishti Manipal Institute of Art, Design, and Technology, Bangalore, India. The material is based on A Layered Grammar of Graphics by Hadley Wickham. The course is meant for First Year students pursuing a Degree in Art and Design.
The intent of this GSK part is to build skill in coding in R, and also to appreciate R as a way to metaphorically visualize information of various kinds, using predominantly geometric figures and structures.
All RMarkdown files combine code, text, web-images, and figures developed using code. Everything is text; code chunks are enclosed in fences (```)
2 Goals
At the end of this Lab session, we should:
- know the types and structures of network data and be able to work with them
- understand the basics of modern network packages in R
- be able to create network visualizations using tidygraph, ggraph (static visualizations) and visNetwork (interactive visualizations)
- see directions for how the network metaphor applies in a variety of domains (e.g. biology/ecology, ideas/influence, technology, transportation, to name a few)
PREDICT Inspect the code and guess at what it might do; write down your predictions.
RUN the code provided and check what happens.
INFER what the parameters of the code do and write comments to explain. What bells and whistles can you see?
MODIFY the parameters of the code provided to understand the options available. Write comments to show what you have aimed for and achieved.
MAKE: take an idea/concept of your own, and graph it.
4 Set Up
The setup code chunk below brings into our coding session R packages that provide specific computational abilities and also datasets which we can use.
To reiterate: Packages and datasets are not the same thing!! Packages are (small) collections of programs. Datasets are just... information.
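A setup chunk of this kind might look like the sketch below. The exact package list is an assumption, pieced together from the packages and functions named in this document; it is not necessarily the chunk used in class. The last lines show the package/dataset distinction: `flare` is a dataset that ships inside the ggraph package.

```r
# Packages: collections of functions (assumed list, based on this handout)
library(readr)       # read_csv(), read_rds()
library(dplyr)       # rename(), left_join(), select(), mutate()
library(tidygraph)   # network objects and graph manipulation
library(ggraph)      # static network plots (also attaches ggplot2)
library(visNetwork)  # interactive network plots

# A dataset is just information: `flare` comes bundled with ggraph
class(flare)
names(flare)
```

Packages must be installed once with install.packages() before library() can load them.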
5 Graph Metaphors
Network graphs are characterized by two key terms: nodes and edges
Nodes : Entities
Metaphors: Individual People? Things? Ideas? Places? to be connected in the network.
Synonyms: vertices. Nodes have IDs.
Edges: Connections
Metaphors: Interactions? Relationships? Influence? Letters sent and received? Dependence? between the entities.
Synonyms: links, ties.
In R, we create network representations using node and edge information. One way in which these could be organized is:
- Node list: a data frame with a single column listing the node IDs found in the edge list. You can also add attribute columns to the data frame such as the names of the nodes or grouping variables. (Type? Class? Family? Country? Subject? Race?)
- Edge list: a data frame containing two columns: the source node and the destination node of an edge. Source and destination hold node IDs.
- Weighted network graph: An edge list can also contain additional columns describing attributes of the edges, such as a magnitude aspect for an edge. If the edges have a magnitude attribute, the graph is considered weighted.
Edges Table

| From | To | Relationship       | Weightage |
|------|----|--------------------|-----------|
| 1    | 3  | Financial Dealings | 6         |
| 2    | 1  | History Lessons    | 2         |
| 2    | 3  | Vaccination        | 15        |
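As a minimal sketch, the Edges Table above can be typed in as a data frame and turned into a network object. The node names ("A", "B", "C") are hypothetical, since the table only gives node IDs, and treating the graph as directed is an assumption.

```r
library(tibble)
library(tidygraph)

# Node list: one row per node; the names are made up for illustration
nodes <- tibble(name = c("A", "B", "C"))

# Edge list copied from the Edges Table above
edges <- tibble(
  from = c(1L, 2L, 2L),
  to = c(3L, 1L, 3L),
  relationship = c("Financial Dealings", "History Lessons", "Vaccination"),
  weightage = c(6, 2, 15)
)

# Integer from/to values refer to row positions in `nodes`
g <- tbl_graph(nodes = nodes, edges = edges, directed = TRUE)
g
```

Because the edge list carries a magnitude column (weightage), this would count as a weighted graph.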
Layout: A geometric arrangement of nodes and edges.
Metaphors: Location? Spacing? Distance? Coordinates? Colour? Shape? Size? Provides visual insight due to the arrangement.
Layout Algorithms: Methods that arrange nodes and edges with the aim of optimizing some metric.
Metaphors: Nodes are masses and edges are springs. The layout algorithm minimizes the stretching and compressing of all springs. (BTW, are the spring constants K the same for all springs?...)
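One way to see that a layout is "just coordinates" is to ask ggraph for the layout without drawing anything. The sketch below uses a small ring graph made with tidygraph's create_ring() helper (a stand-in, not course data) and compares a fixed circular layout with the spring-based "kk" layout.

```r
library(tidygraph)
library(ggraph)

ring <- create_ring(6)   # 6 nodes joined in a ring

# Each layout returns a data frame with one row per node and x / y columns
circle_xy <- create_layout(ring, layout = "circle")
spring_xy <- create_layout(ring, layout = "kk")   # spring-based layout

circle_xy[, c("x", "y")]
spring_xy[, c("x", "y")]
```

The same graph gets different x and y values under each algorithm; the nodes and edges themselves do not change.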
Directed and undirected network graph: If the distinction between source and target is meaningful, the network is directed. If the distinction is not meaningful, the network is undirected. Directed edges represent an ordering of nodes, like a relationship extending from one node to another, where switching the direction would change the structure of the network. Undirected edges are simply links between nodes where order does not matter.
Examples:
The World Wide Web is an example of a directed network because hyperlinks connect one Web page to another, but not necessarily the other way around.
Co-authorship networks are examples of undirected networks, where nodes are authors and two authors are connected by an edge if they have written a publication together.
When people send e-mail to each other, the distinction between the sender (source) and the recipient (target) is clearly meaningful, therefore the network is directed.
Connected and Disconnected graphs: If there is some path from any node to any other node, the network is said to be Connected. Otherwise, it is Disconnected.
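These two ideas can be checked in code. The following is a small sketch on a made-up graph (node names "a" to "d" are hypothetical); it uses igraph helper functions, which work on tidygraph objects because a tbl_graph is also an igraph object.

```r
library(tibble)
library(tidygraph)

nodes <- tibble(name = c("a", "b", "c", "d"))      # "d" has no edges at all
edges <- tibble(from = c(1L, 2L), to = c(2L, 3L))  # a - b and b - c

g_directed <- tbl_graph(nodes, edges, directed = TRUE)
g_undirected <- tbl_graph(nodes, edges, directed = FALSE)

igraph::is_directed(g_directed)         # TRUE
igraph::is_directed(g_undirected)       # FALSE
igraph::is_connected(g_undirected)      # FALSE, because "d" is isolated
igraph::count_components(g_undirected)  # 2 separate pieces
```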
6 Predict/Run/Infer -1
6.1 Using tidygraph and ggraph
tidygraph and ggraph are modern R packages for network data. Graph Data setup and manipulation is done in tidygraph and graph visualization with ggraph.
tidygraph: Data -> “Network Object” in R.
ggraph: Network Object -> Plots using a chosen layout/algo.
Both leverage the power of igraph, which is the Big Daddy of all network packages. We will be using the Grey’s Anatomy dataset in our first foray into networks.
grey_edges <- read_csv("./Data/greys-edges.csv")
## Rows: 57 Columns: 4
## -- Column specification --------------------------------------------------------
## Delimiter: ","
## chr (3): from, to, type
## dbl (1): weight
##
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.
grey_nodes
## # A tibble: 54 x 7
## name sex race birthyear position season sign
## <chr> <chr> <chr> <dbl> <chr> <dbl> <chr>
## 1 Addison Montgomery F White 1967 Attending 1 Libra
## 2 Adele Webber F Black 1949 Non-Staff 2 Leo
## 3 Teddy Altman F White 1969 Attending 6 Pisces
## 4 Amelia Shepherd F White 1981 Attending 7 Libra
## 5 Arizona Robbins F White 1976 Attending 5 Leo
## 6 Rebecca Pope F White 1975 Non-Staff 3 Gemini
## 7 Jackson Avery M Black 1981 Resident 6 Leo
## 8 Miranda Bailey F Black 1969 Attending 1 Virgo
## 9 Ben Warren M Black 1972 Other 6 Aquarius
## 10 Henry Burton M White 1972 Non-Staff 7 Cancer
## # ... with 44 more rows
grey_edges
## # A tibble: 57 x 4
## from to weight type
## <chr> <chr> <dbl> <chr>
## 1 Leah Murphy Arizona Robbins 2 friends
## 2 Leah Murphy Alex Karev 4 benefits
## 3 Lauren Boswell Arizona Robbins 1 friends
## 4 Arizona Robbins Callie Torres 1 friends
## 5 Callie Torres Erica Hahn 6 friends
## 6 Callie Torres Alex Karev 12 benefits
## 7 Callie Torres Mark Sloan 5 professional
## 8 Callie Torres George O'Malley 2 professional
## 9 George O'Malley Izzie Stevens 3 professional
## 10 George O'Malley Meredith Grey 4 friends
## # ... with 47 more rows
Questions and Inferences #1:
Look at the console output thumbnail. What does, for example, name = col_character mean? What attributes (i.e. extra information) are seen for Nodes and Edges? Understand the data in both nodes and edges as shown in the second and third thumbnails. Write some comments and inferences here.
6.3 Step 2. Create a network object using tidygraph:
Key function:
tbl_graph(): (aka “tibble graph”). Key arguments: nodes, edges and directed. Note this is a very versatile command and can take many input forms, such as data structures that result from other packages. Type ?tbl_graph in the Console and see the Usage section.
ga <- tbl_graph(nodes = grey_nodes,
                edges = grey_edges,
                directed = FALSE)
ga
## # A tbl_graph: 54 nodes and 57 edges
## #
## # An undirected simple graph with 4 components
## #
## # Node Data: 54 x 7 (active)
## name sex race birthyear position season sign
## <chr> <chr> <chr> <dbl> <chr> <dbl> <chr>
## 1 Addison Montgomery F White 1967 Attending 1 Libra
## 2 Adele Webber F Black 1949 Non-Staff 2 Leo
## 3 Teddy Altman F White 1969 Attending 6 Pisces
## 4 Amelia Shepherd F White 1981 Attending 7 Libra
## 5 Arizona Robbins F White 1976 Attending 5 Leo
## 6 Rebecca Pope F White 1975 Non-Staff 3 Gemini
## # ... with 48 more rows
## #
## # Edge Data: 57 x 4
## from to weight type
## <int> <int> <dbl> <chr>
## 1 5 47 2 friends
## 2 21 47 4 benefits
## 3 5 46 1 friends
## # ... with 54 more rows
Questions and Inferences #2:
Questions and Inferences: What information does the graph object contain? What attributes do the nodes have? What about the edges?
6.4 Step 3. Plot using ggraph
3a. Quick Plot: autograph() This is to check quickly whether the data has been imported properly and to decide whether to go on to more elaborate plotting.
autograph(ga)
Questions and Inferences #3:
Questions and Inferences: Describe this graph, in simple words here. Try to use some of the new domain words we have just acquired: nodes/edges, connected/disconnected, directed/undirected.
3b. More elaborate plot
Key functions:
ggraph(layout = "......"): Create classic node-edge diagrams; i.e. Sets up the graph. Rather like ggplot for networks!
Two kinds of geom: one set for nodes, and another for edges
geom_node_point(aes(.....)): Draws node as “points”. Alternatives are circle / arc_bar / tile / voronoi. Remember the geoms that we have seen before in Grammar of Graphics!
geom_edge_link(aes(.....)): Draws edges as “links”. Alternatives are arc / bend / elbow / hive / loop / parallel / diagonal / point / span /tile.
geom_node_text(aes(label = ......), repel = TRUE): Adds text labels (non-overlapping). Alternatives are label /...
labs(title = "....", subtitle = "....", caption = "...."): Change main titles, axis labels and legend titles. We know this from our work with ggplot.
# Write Comments next to each line
# About what that line does for the overall graph
ggraph(graph = ga, layout = "kk") +
  #
  geom_edge_link(width = 2, color = "pink") +
  #
  geom_node_point(
    shape = 21,
    size = 8,
    fill = "blue",
    color = "green",
    stroke = 2
  ) +
  #
  labs(title = "Whoo Hoo! My first silly Grey's Anatomy graph in R!",
       subtitle = "Why did Ramesh put me in this course...",
       caption = "Bro, they are doing **cool** things in the other classes...")
Questions and Inferences #3:
Questions and Inferences: What parameters have been changed here, compared to the earlier graph? Where do you see these changes in the code above?
Let us Play with this graph and see if we can make some small changes. Colour? Fill? Width? Size? Stroke? Labs? Of course!
# Change the parameters in each of the commands here to new ones
# Use fixed values for colours or sizes...etc.
ggraph(graph = ga, layout = "kk") +
  geom_edge_link(width = 2) +
  geom_node_point(shape = 21, size = 8,
                  fill = "blue",
                  color = "green",
                  stroke = 2) +
  labs(title = "Whoo Hoo! My next silly Grey's Anatomy graph in R!",
       subtitle = "Why did Ramesh put me in this course...",
       caption = "Bro, they are doing cool things in the other classes...")
Questions and Inferences #4:
Questions and Inferences: What did the shape parameter achieve? What are the possibilities with shape? How about including alpha?
3c. Aesthetic Mapping from Node and Edge attribute columns
Up to now, we have assigned specific numbers to geometric aesthetics such as shape and size. Now we are ready (maybe?) to change the meaning and significance of the entire graph and each element within it, and use aesthetics / metaphoric mappings to achieve new meanings or insights. Let us try using aes() inside each geom to map a variable to a geometric aspect.
Don’t try to use more than 2 aesthetic mappings simultaneously!!
The node elements we can tweak are:
Types of Nodes: geom_node_****()
Node Parameters: inside geom_node_****(aes(...............))
- aes(alpha = node-variable) : opacity; a value between 0 and 1
- aes(shape = node-variable) : node shape
- aes(colour = node-variable) : node colour
- aes(fill = node-variable) : fill colour for node
- aes(size = node-variable) : size of node
The edge elements we can tweak are:
Types of Edges: geom_edge_****()
Edge Parameters: inside geom_edge_****(aes(...............))
- aes(colour = edge-variable) : colour of the edge
- aes(width = edge-variable) : width of the edge
- aes(label = edge-variable) : labels for the edge
Type ?geom_node_point and ?geom_edge_link in your Console for more information.
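To make the idea of mapping concrete without giving away the Grey's Anatomy exercise, here is a minimal sketch on a made-up graph. All names, teams, weights and types below are hypothetical. Edge width and colour are mapped from edge columns, and node fill is mapped from a node column.

```r
library(tibble)
library(tidygraph)
library(ggraph)

# Hypothetical toy data: every value here is invented for illustration
toy_nodes <- tibble(name = c("P", "Q", "R", "S"),
                    team = c("red", "red", "blue", "blue"))
toy_edges <- tibble(from = c("P", "P", "Q", "R"),
                    to = c("Q", "R", "R", "S"),
                    weight = c(1, 3, 2, 5),
                    type = c("friends", "work", "work", "friends"))
toy <- tbl_graph(nodes = toy_nodes, edges = toy_edges, directed = FALSE)

ggraph(toy, layout = "fr") +
  # edge columns -> edge aesthetics
  geom_edge_link(aes(width = weight, colour = type)) +
  # node column -> node aesthetic; shape and size stay fixed
  geom_node_point(aes(fill = team), shape = 21, size = 8) +
  geom_node_text(aes(label = name))
```

Anything inside aes() varies with the data and gets a legend; anything outside aes() is a fixed value applied to every node or edge.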
ggraph(graph = ga, layout = "fr") +
  geom_edge_link0() + # add mapping here
  geom_node_point() + # add mapping here
  geom_node_label(aes(label = name), # modify this mapping
                  repel = TRUE, max.overlaps = 20,
                  alpha = 0.6,
                  size = 3) +
  labs(title = "Whoo Hoo! Yet another Grey's Anatomy graph in R!")
Questions and Inferences #5:
Questions and Inferences: Describe some of the changes here. What types of edges worked? Which variables were you able to use for nodes and edges and how? What did not work with either of the two?
Questions and Inferences: How does this graph look “metaphorically” different? Do you see a difference in the relationships between people here? Why?
8 Hierarchical layouts
These provide for some alternative metaphorical views of networks. Note that not all layouts are possible for all datasets!!
# setting theme_graph
set_graph_style()
# This dataset contains the graph that describes the class
# hierarchy for the Flare visualization library.
# Type ?flare in your Console
head(flare$vertices)
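One hierarchical view that could be tried with this dataset is a circular dendrogram. The sketch below follows the pattern shown in the ggraph layout documentation; it assumes `flare$edges` links the names listed in `flare$vertices`, and it is only one of several possible hierarchical layouts.

```r
library(tidygraph)
library(ggraph)

# `flare` ships with ggraph: a list holding `vertices` and `edges` data frames
flare_graph <- tbl_graph(nodes = flare$vertices, edges = flare$edges)

ggraph(flare_graph, layout = "dendrogram", circular = TRUE) +
  geom_edge_diagonal() +
  geom_node_point(aes(filter = leaf)) +   # draw points only at the leaves
  coord_fixed()
```

A dendrogram layout needs tree-shaped data (every node except the root has exactly one parent), which is why it would not work for the Grey's Anatomy network.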
Questions and Inferences: Does splitting up the main graph into subnetworks give you more insight? Describe some of these.
10 Network analysis with tidygraph
The data frame graph representation can be easily augmented with metrics or statistics computed on the graph. Remember how we computed counts with the penguin dataset in Grammar of Graphics.
Before computing a metric on nodes or edges, use the activate() function to activate either the node or the edge data frame. Use dplyr verbs (filter, arrange, mutate) to achieve your computation in the proper way.
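A minimal sketch of the activate-then-mutate pattern, on a made-up star-shaped graph (one "hub" joined to three other nodes; the names are hypothetical):

```r
library(tibble)
library(dplyr)
library(tidygraph)

star <- tbl_graph(
  nodes = tibble(name = c("hub", "a", "b", "c")),
  edges = tibble(from = c(1L, 1L, 1L), to = c(2L, 3L, 4L)),
  directed = FALSE
)

star_scored <- star %>%
  activate(nodes) %>%                      # work on the node table
  mutate(degree = centrality_degree()) %>%
  activate(edges) %>%                      # now switch to the edge table
  mutate(betweenness = centrality_edge_betweenness())

# Pull out either table as an ordinary tibble
star_scored %>% activate(nodes) %>% as_tibble()
star_scored %>% activate(edges) %>% as_tibble()
```

The hub gets degree 3 and the other nodes degree 1, which matches what we can count by eye on such a small graph.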
10.1 Network Centrality
Centrality is an “ill-defined” metric of node and edge importance in a network. It is therefore calculated in many ways. Type ?centrality in your Console.
Let’s add a few columns to the nodes and edges based on network centrality measures:
## # A tbl_graph: 54 nodes and 57 edges
## #
## # An undirected simple graph with 4 components
## #
## # Edge Data: 57 x 5 (active)
## from to weight type betweenness
## <int> <int> <dbl> <chr> <dbl>
## 1 5 47 2 friends 20.3
## 2 21 47 4 benefits 44.7
## 3 5 46 1 friends 39
## 4 5 41 1 friends 66.3
## 5 18 41 6 friends 39
## 6 21 41 12 benefits 91.5
## # ... with 51 more rows
## #
## # Node Data: 54 x 8
## name sex race birthyear position season sign degree
## <chr> <chr> <chr> <dbl> <chr> <dbl> <chr> <dbl>
## 1 Addison Montgomery F White 1967 Attending 1 Libra 3
## 2 Adele Webber F Black 1949 Non-Staff 2 Leo 1
## 3 Teddy Altman F White 1969 Attending 6 Pisces 4
## # ... with 51 more rows
Packages tidygraph and ggraph can be pipelined to perform analysis and visualization tasks in one go.
# setting theme_graph
set_graph_style()
ga %>%
  activate(nodes) %>%
  # Who has the most connections?
  mutate(degree = centrality_degree()) %>%
  activate(edges) %>%
  # Who is the go-through person?
  mutate(betweenness = centrality_edge_betweenness()) %>%
  # Now to continue with plotting
  ggraph(layout = "nicely") +
  geom_edge_link(aes(alpha = betweenness)) +
  geom_node_point(aes(size = degree, colour = degree)) +
  # discrete colour legend
  scale_color_gradient(guide = "legend")
# or even less typing
ggraph(ga, layout = "nicely") +
  geom_edge_link(aes(alpha = centrality_edge_betweenness())) +
  geom_node_point(aes(colour = centrality_degree(),
                      size = centrality_degree())) +
  scale_color_gradient(guide = "legend",
                       low = "green",
                       high = "red")
Questions and Inferences #10:
Questions and Inferences: How do the Centrality Measures show up in the graph? Would you “agree” with the way we have done it? Try to modify the aesthetics by copy-pasting this chunk below and see how you can make an alternative representation.
10.2 Analyse and visualize network: communities
Who is close to whom? Which are the groups you can see?
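As a sketch of how communities might be computed and shown, the example below uses a made-up graph of two triangles joined by one bridge edge. The choice of group_louvain() is an assumption for illustration; tidygraph offers several group_*() functions, and the one used in class may differ.

```r
library(tibble)
library(tidygraph)
library(ggraph)

# Hypothetical toy graph: triangle a-b-c, triangle d-e-f, bridge c-d
two_groups <- tbl_graph(
  nodes = tibble(name = c("a", "b", "c", "d", "e", "f")),
  edges = tibble(from = c(1L, 1L, 2L, 4L, 4L, 5L, 3L),
                 to   = c(2L, 3L, 3L, 5L, 6L, 6L, 4L)),
  directed = FALSE
) %>%
  activate(nodes) %>%
  # group_louvain() returns a group number per node; a factor gives discrete colours
  mutate(community = as.factor(group_louvain()))

ggraph(two_groups, layout = "fr") +
  geom_edge_link() +
  geom_node_point(aes(colour = community), size = 6) +
  geom_node_text(aes(label = name), nudge_y = 0.15)
```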
Questions and Inferences: Is the Community depiction clear? How would you do it, with which aesthetic? Copy Paste this chunk below and try.
11 Interactive Graphs with visNetwork
Exploring the visNetwork package. Make graphs wiggle and shake using tidy commands! The package implements interactivity using the physical metaphor of masses and springs we discussed earlier.
The visNetwork() function uses a nodes list and an edges list to create an interactive graph. The nodes list must include an “id” column, and the edge list must have “from” and “to” columns. The function also labels the nodes using the “label” column in the node list.
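A minimal sketch of these column requirements, using made-up data (the labels and values are hypothetical). The "value" column on the edges is what visNetwork uses to scale edge width.

```r
library(visNetwork)

# Hypothetical toy data with the column names visNetwork expects
vis_nodes <- data.frame(id = 1:3, label = c("A", "B", "C"))
vis_edges <- data.frame(from = c(1, 2, 2),
                        to = c(3, 1, 3),
                        value = c(6, 2, 15))

vn <- visNetwork(nodes = vis_nodes, edges = vis_edges) %>%
  visOptions(highlightNearest = TRUE)   # clicking a node highlights its neighbours

# Typing `vn` in the Console or in a chunk displays the interactive widget
```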
library(visNetwork)
# Prepare the data for plotting by visNetwork
grey_nodes
## # A tibble: 54 x 7
## name sex race birthyear position season sign
## <chr> <chr> <chr> <dbl> <chr> <dbl> <chr>
## 1 Addison Montgomery F White 1967 Attending 1 Libra
## 2 Adele Webber F Black 1949 Non-Staff 2 Leo
## 3 Teddy Altman F White 1969 Attending 6 Pisces
## 4 Amelia Shepherd F White 1981 Attending 7 Libra
## 5 Arizona Robbins F White 1976 Attending 5 Leo
## 6 Rebecca Pope F White 1975 Non-Staff 3 Gemini
## 7 Jackson Avery M Black 1981 Resident 6 Leo
## 8 Miranda Bailey F Black 1969 Attending 1 Virgo
## 9 Ben Warren M Black 1972 Other 6 Aquarius
## 10 Henry Burton M White 1972 Non-Staff 7 Cancer
## # ... with 44 more rows
grey_edges
## # A tibble: 57 x 4
## from to weight type
## <chr> <chr> <dbl> <chr>
## 1 Leah Murphy Arizona Robbins 2 friends
## 2 Leah Murphy Alex Karev 4 benefits
## 3 Lauren Boswell Arizona Robbins 1 friends
## 4 Arizona Robbins Callie Torres 1 friends
## 5 Callie Torres Erica Hahn 6 friends
## 6 Callie Torres Alex Karev 12 benefits
## 7 Callie Torres Mark Sloan 5 professional
## 8 Callie Torres George O'Malley 2 professional
## 9 George O'Malley Izzie Stevens 3 professional
## 10 George O'Malley Meredith Grey 4 friends
## # ... with 47 more rows
## # A tibble: 54 x 8
## id label group race birthyear position season sign
## <int> <chr> <chr> <chr> <dbl> <chr> <dbl> <chr>
## 1 1 Addison Montgomery Female White 1967 Attending 1 Libra
## 2 2 Adele Webber Female Black 1949 Non-Staff 2 Leo
## 3 3 Teddy Altman Female White 1969 Attending 6 Pisces
## 4 4 Amelia Shepherd Female White 1981 Attending 7 Libra
## 5 5 Arizona Robbins Female White 1976 Attending 5 Leo
## 6 6 Rebecca Pope Female White 1975 Non-Staff 3 Gemini
## 7 7 Jackson Avery Male Black 1981 Resident 6 Leo
## 8 8 Miranda Bailey Female Black 1969 Attending 1 Virgo
## 9 9 Ben Warren Male Black 1972 Other 6 Aquarius
## 10 10 Henry Burton Male White 1972 Non-Staff 7 Cancer
## # ... with 44 more rows
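The code that produced the id / label / group table above is not shown in this document. One way such a table might be built is sketched below, using only the first three rows of grey_nodes copied from the printout; the object names ending in `_head` are hypothetical.

```r
library(tibble)
library(dplyr)

# First three rows of grey_nodes, copied from the printout above
grey_nodes_head <- tibble(
  name = c("Addison Montgomery", "Adele Webber", "Teddy Altman"),
  sex = c("F", "F", "F"),
  race = c("White", "Black", "White"),
  birthyear = c(1967, 1949, 1969)
)

grey_nodes_vis_head <- grey_nodes_head %>%
  mutate(id = row_number(),                                  # visNetwork needs an id
         group = if_else(sex == "F", "Female", "Male")) %>%  # group drives colour
  select(id, label = name, group, race, birthyear)

grey_nodes_vis_head
```

The edge list would then need its from and to names converted to these ids, for example with left_join().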
Some idea of interactivity and controls with visNetwork:
library(visNetwork)
# let's look again at the data
starwars_nodes <- read_csv("./Data/star-wars-network-nodes.csv")
## Rows: 22 Columns: 2
## -- Column specification --------------------------------------------------------
## Delimiter: ","
## chr (1): name
## dbl (1): id
##
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.
# We need to rename starwars nodes dataframe and edge dataframe columns for visNetwork
starwars_nodes_vis <-
  starwars_nodes %>%
  rename("label" = name)
# Convert from and to columns to **node ids**
starwars_edges_vis <-
  starwars_edges %>%
  # Matching Source <- Source Node id ("id.x")
  left_join(., starwars_nodes_vis, by = c("source" = "label")) %>%
  # Matching Target <- Target Node id ("id.y")
  left_join(., starwars_nodes_vis, by = c("target" = "label")) %>%
  # Select "id.x" and "id.y" ONLY
  # Rename them as "from" and "to"
  # keep "weight" column for aesthetics of edges
  select("from" = id.x, "to" = id.y, "value" = weight)
# Check everything once
starwars_nodes_vis
## # A tibble: 22 x 2
## label id
## <chr> <dbl>
## 1 R2-D2 0
## 2 CHEWBACCA 1
## 3 C-3PO 2
## 4 LUKE 3
## 5 DARTH VADER 4
## 6 CAMIE 5
## 7 BIGGS 6
## 8 LEIA 7
## 9 BERU 8
## 10 OWEN 9
## # ... with 12 more rows
Note that this is not a set of nodes, nor edges, but already a graph-object!
So no need to create a graph object using tbl_graph.
You will need to just go ahead and plot using ggraph.
Game of Thrones:
Start with pulling this data into your Rmarkdown:
GoT <- read_rds("./data/GoT.RDS")
Note that this is a list of 7 graphs from Game of Thrones.
Select one using GoT[[index]] where index = 1…7 and then plot directly.
Try to access the nodes and edges and modify them using any attribute data
Any other graph dataset from igraphdata (type ?igraphdata in console)
Ask me for help if you need any
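The pattern of picking one graph from a list, adding an attribute and plotting it can be sketched as below. The list here is a stand-in built from two small igraph graphs, not the GoT data, and the `degree` column is just one example of an attribute to add.

```r
library(tidygraph)
library(ggraph)

# Stand-in for a list of graph objects (the contents of GoT.RDS are not reproduced here)
graph_list <- list(igraph::make_ring(5),
                   igraph::make_star(4, mode = "undirected"))

# Select one graph by index and make sure it is a tidygraph object
g <- as_tbl_graph(graph_list[[2]]) %>%
  activate(nodes) %>%
  mutate(degree = centrality_degree())

# Look at the node and edge tables
g %>% activate(nodes) %>% as_tibble()
g %>% activate(edges) %>% as_tibble()

ggraph(g, layout = "nicely") +
  geom_edge_link() +
  geom_node_point(aes(size = degree))
```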
12.3 Make-2: Literary Network with TV Show / Book / Story / Play
This is in groups. Groups of 4. To be announced
You need to create a Network Graph for your favourite Book, play, TV serial or Show. (E.g. Friends, BBT, or LB or HIMYM…or Hamlet, Little Women, Pride and Prejudice, or LoTR)
Step 1. Go to: Literary Networks for instructions. (Instructions are also on Teams -> Files.)
Step 2. Make your data using the instructions.
In the nodes Excel file, use id and names as your columns. Put any other details in other columns to the right.
In your edges Excel file, use from and to as your first columns. Entries in these columns can be names or ids, but be consistent and don’t mix.
Step 3. Decide on 3 questions that you seek to answer and plan to make graphs for.
Step 4. Create graph objects. Say 3 visualizations.
Step 5. Write comments/answers in the code and narrative text. Add pictures from the web using Markdown syntax.
Step 6. Write a Reflection (ok, a short one!) inside your RMarkdown. Make sure it knits!!
Step 7. Group Submission: Submit the knittable .Rmd file AND the data. RMarkdown with joint authorship. Each person submits on their Assignments. All get the same grade on this one.
Ask me for clarifications on what to do after you have read the Instructions in your group.